[BugFix] EPLB + B200 + DeepGEMM : Handle column-major scales tensor #29162
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request addresses a bug that occurs on B200 hardware when DeepGEMM is used in combination with EPLB: transposed, non-contiguous MoE weight-scale tensors cause an assertion failure. The fix changes the tensor's view to be contiguous before it is processed by the EPLB logic. The PR also adds a test to validate the fix and refactors some distributed testing utilities into a shared file, which is a good improvement. The overall approach is sound and the changes are well implemented. I have one minor suggestion to improve the clarity of a docstring for future maintainability.
Force-pushed from 64d9ca1 to 25db0d9
@codex review
cc @elvircrn @ilmarkov @SageMoore @abmfy PTAL! Thanks 🙌
Codex Review: Didn't find any major issues. Hooray!
Looks good to me! Thank you for the fix!
    weights = list(self.named_parameters())
    weights = [(name, _maybe_make_contiguous(name, p)) for name, p in weights]
Probably, instead of the is_contiguous check, we need to check that the tensor is row-major with num_local_experts in the first dimension. Whether the tensor is contiguous in the other dimensions is not important, as we flatten the view in those dimensions.
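The distinction the reviewer draws can be illustrated with a small sketch (the helper name is hypothetical, not from the PR): a tensor can fail `is_contiguous()` while still having the expert dimension as its slowest-varying axis, which is the property that matters when the trailing dimensions are flattened.

```python
import torch

def is_row_major_in_expert_dim(t: torch.Tensor) -> bool:
    # Hypothetical check: dim 0 (num_local_experts) is the slowest-varying
    # dimension iff its stride equals the number of elements per expert.
    # Contiguity within the trailing dimensions is not required here.
    return t.stride(0) == t.numel() // t.shape[0]

scales = torch.randn(4, 8, 16)       # (num_local_experts, n, k)
transposed = scales.transpose(1, 2)  # column-major within each expert
print(transposed.is_contiguous())              # False
print(is_row_major_in_expert_dim(transposed))  # True: stride(0) == 16 * 8
```

The transposed tensor fails the strict contiguity check, yet each expert's data still occupies one contiguous 128-element slab, so an expert-level shuffle remains safe.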
Looks like pre-commit is failing on unrelated files in vLLM PRs; saw the same failure here: #29188
Signed-off-by: Varun Sundar Rabindranath <[email protected]>
Force-pushed from ba3820b to 9a1d798
abmfy left a comment:
The current fix LGTM. Thanks!
The point of the assertion is to ensure that …
Purpose
On B200, the following command fails,
On B200, when we use DeepGEMM, we transpose the MoE weight_scale tensors for efficient DeepGEMM matmuls. This, in combination with EPLB, fails the following assertion, as the weight-scale tensors are not contiguous.
Fix: This PR changes the view of the tensor so that the is_contiguous() check passes. We also add a test to verify that this view update is safe.
Test Plan
tests/distributed/test_eplb_fused_moe_layer.py passes
lm-eval test
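The view change described above can be sketched in a few lines (a minimal illustration, not the PR's actual code): DeepGEMM keeps the scales column-major, so the logical view is a transpose and fails `is_contiguous()`; re-viewing the same storage under its physical, row-major shape is contiguous again without copying any data.

```python
import torch

# Physically row-major storage holding column-major (transposed) scales.
storage = torch.randn(16, 8)  # physical (k, n) layout, row-major
logical = storage.t()         # logical (n, k) view used for DeepGEMM

# This non-contiguous view is what trips EPLB's assertion.
print(logical.is_contiguous())  # False

# Re-viewing under the physical shape is contiguous and shares storage,
# so EPLB can rearrange experts without ever copying the weights.
contig_view = logical.t()
print(contig_view.is_contiguous())  # True
print(contig_view.data_ptr() == logical.data_ptr())  # True: same storage
```

Because both views alias the same memory, any EPLB shuffle performed through the contiguous view is immediately reflected in the transposed view DeepGEMM consumes, which is exactly what the added test would need to verify.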
Test Result
server command + --enable-eplb --eplb-config '{"window_size":10,"step_interval":100,"num_redundant_experts":0,"log_balancedness":true}'
server command (without eplb)